Vocabulary Reduction in BoW Representing by Topic Modeling
Abstract
This work presents a new approach to vocabulary reduction, based on filtering words in the topic feature space instead of directly in the original word space. The main goal is to compare the Cumulative Count-based word filter (fcc) applied in the word feature space (BoW: Bag of Words) with the same filter applied to topic descriptions (obtained by LDA: Latent Dirichlet Allocation). Three well-known text datasets (Reuters, WebKB and NewsGroup) are used to evaluate the performance of the proposed approach.
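The contrast described in the abstract can be illustrated with a minimal sketch. The abstract does not define fcc precisely, so the code below assumes one plausible reading: sort words by weight and keep the top ones until their running total reaches a chosen fraction of the overall weight. The function names, the `keep_fraction` parameter, and the dictionary-based topic representation are all hypothetical and are not taken from the paper; in practice the per-topic word weights would come from a trained LDA model.

```python
def cumulative_filter(weights, keep_fraction):
    """Keep the highest-weight words until their running total reaches
    keep_fraction of the overall weight (an ASSUMED reading of fcc)."""
    total = sum(weights.values())
    if total <= 0:
        return set()
    kept, running = set(), 0.0
    # Descending weight; ties broken alphabetically so results are deterministic.
    for word, w in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if running >= keep_fraction * total:
            break
        kept.add(word)
        running += w
    return kept


def filter_in_word_space(docs, keep_fraction):
    """Apply the filter to corpus-wide word counts (the BoW setting)."""
    counts = {}
    for doc in docs:
        for word in doc:
            counts[word] = counts.get(word, 0) + 1
    return cumulative_filter(counts, keep_fraction)


def filter_in_topic_space(topic_word, keep_fraction):
    """Apply the filter to each topic's word weights and take the union.

    topic_word is a list of {word: weight} dicts, one per topic
    (hypothetical stand-in for the topic descriptions produced by LDA).
    """
    vocab = set()
    for weights in topic_word:
        vocab |= cumulative_filter(weights, keep_fraction)
    return vocab


# Toy data, invented purely for illustration.
docs = [["a", "a", "a", "b"], ["a", "b", "c"]]
topics = [{"a": 0.7, "b": 0.2, "c": 0.1}, {"c": 0.6, "d": 0.3, "a": 0.1}]
word_space_vocab = filter_in_word_space(docs, 0.8)
topic_space_vocab = filter_in_topic_space(topics, 0.5)
```

Under this reading, the word-space filter favours globally frequent words, whereas the topic-space filter can retain a word that is rare overall but dominant within a single topic.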
Similar resources
Image classification by visual bag-of-words refinement and reduction
This paper presents a new framework for visual bag-of-words (BOW) refinement and reduction to overcome the drawbacks associated with the visual BOW model which has been widely used for image classification. Although very influential in the literature, the traditional visual BOW model has two distinct drawbacks. Firstly, for efficiency purposes, the visual vocabulary is commonly constructed by d...
Improving Performances of BoW-based Image Retrieval by Using Contextual Keypoint Descriptors
The paper reports an improved method of content-based image retrieval using the well-known bag-of-words (BoW) method. Words built over descriptors of popular affine-invariant keypoint detectors (Harris-Affine and Hessian-Affine are exemplary choices) are used. What is novel, however, is the number of descriptors (i.e. the number of words) representing individual keypoints. Instead of SIFT (or ...
Personalized Multi-Document Summarization using N-Gram Topic Model Fusion
We consider the problem of probabilistic topic modeling for query-focused multi-document summarization. Rather than modeling topics as distributions over a vocabulary of terms, we extend the probabilistic latent semantic analysis (PLSA) approach with a bigram language model. This allows us to relax the conditional independence assumption between words made by standard topic models. We present a...
Fast Visual Vocabulary Construction for Image Retrieval Using Skewed-Split k-d Trees
Most of the image retrieval approaches nowadays are based on the Bag-of-Words (BoW) model, which allows for representing an image efficiently and quickly. The efficiency of the BoW model is related to the efficiency of the visual vocabulary. In general, visual vocabularies are created by clustering all available visual features, formulating specific patterns. Clustering techniques are k-means o...
Building an Enhanced Vocabulary of the Robot Environment with a Ceiling Pointing Camera
Mobile robots are of great help for automatic monitoring tasks in different environments. One of the first tasks that needs to be addressed when creating these kinds of robotic systems is modeling the robot environment. This work proposes a pipeline to build an enhanced visual model of a robot environment indoors. Vision based recognition approaches frequently use quantized feature spaces, comm...